Protein Structures and Information Extraction from Biological Texts: The PASTA System

Authors

Robert J. Gaizauskas

George Demetriou

Peter J. Artymiuk

Peter Willett

Abstract

MOTIVATION: The rapid increase in the volume of protein structure literature means that useful information may be hidden or lost in the published literature, and the process of finding relevant material, sometimes the rate-determining factor in new research, may be arduous and slow.

RESULTS: We describe the Protein Active Site Template Acquisition (PASTA) system, which addresses these problems by automatically extracting information about the roles of specific amino acid residues in protein molecules from online scientific articles and abstracts. Both the terminology recognition and the extraction capabilities of the system have been extensively evaluated against manually annotated data, and the results compare favourably with state-of-the-art results obtained in less challenging domains. PASTA is the first information extraction (IE) system developed for the protein structure domain and one of the most thoroughly evaluated IE systems operating on biological scientific text to date.

AVAILABILITY: PASTA makes its extraction results available via a browser-based front end: http://www.dcs.shef.ac.uk/nlp/pasta/. The evaluation resources (manually annotated corpora) are also available through the website: http://www.dcs.shef.ac.uk/nlp/pasta/results.html.

Similar resources

Two Applications of Information Extraction to Biological Science Journal Articles: Enzyme Interactions and Protein

Information extraction technology, as defined and developed through the U.S. DARPA Message Understanding Conferences (MUCs), has proved successful at extracting information primarily from newswire texts and primarily in domains concerned with human activity. In this paper we consider the application of this technology to the extraction of information from scientific journal papers in the area of ...

Two applications of information extraction to biological science journal articles: enzyme interactions and protein structures.

Information extraction technology, as defined and developed through the U.S. DARPA Message Understanding Conferences (MUCs), has proved successful at extracting information primarily from newswire texts and primarily in domains concerned with human activity. In this paper we consider the application of this technology to the extraction of information from scientific journal papers in the area o...

Automatically Extracting Enzyme Interaction and Protein Structure Information from Biological Science Journal Articles

With the explosive growth of scientific literature in the area of molecular biology, the need to automatically process and extract information from on-line text sources has become increasingly important. In this paper we consider the application of Information Extraction (IE) technology to the extraction of factual information from biological journal papers. IE has proved successful at extracti...

Utilizing text mining results: The Pasta Web System

Information Extraction (IE), defined as the activity to extract structured knowledge from unstructured text sources, offers new opportunities for the exploitation of biological information contained in the vast amounts of scientific literature. But while IE technology has received increasing attention in the area of molecular biology, there have not been many examples of IE systems successfully...

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

Journal:

Bioinformatics

Volume 19, Issue 1

Pages -

Publication date 2003
